Adding SDF LevelSet #187
Conversation
So if I have a random GLB, might it be possible to test this? Do I need to generate SDFs in some form? What form is that?
There's documentation in sdf.h for the SDF class. Any function that takes a vec3 and returns a float will do. Turning an arbitrary GLB into an SDF is pretty non-trivial - that's not really what it's for. It's about creating complex surfaces from mathematical functions. Try out the sample.
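As an illustration of "any function that takes a vec3 and returns a float", here is a hedged sketch of a gyroid functor along the lines of the sample (the struct name and exact integration details are assumptions):

#include <cmath>
#include <glm/glm.hpp>

// Sketch: the implicit gyroid surface as a functor. The zero level set is
// the surface; positive values lie on one side, negative on the other.
struct Gyroid {
  __host__ __device__ float operator()(glm::vec3 p) const {
    return cosf(p.x) * sinf(p.y) + cosf(p.y) * sinf(p.z) +
           cosf(p.z) * sinf(p.x);
  }
};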
Some comments about readability and the use of atomics. I haven't yet looked into the details of the algorithmic part.
src/sdf/include/sdf.h (outdated)
#else
  std::atomic<Uint64>& tar = reinterpret_cast<std::atomic<Uint64>&>(target);
  Uint64 old_val = tar.load();
  tar.compare_exchange_weak(compare, val);
compare_exchange may fail: it will fail when there are interleaved CAS operations (the later one fails), and compare_exchange_weak can additionally fail spuriously (e.g. when the cache line is merely touched), since the weak form can be implemented more efficiently on some architectures, like ARM with its exclusive monitor. Generally, when you just want to update a value atomically, you have to do it in a loop, repeating the CAS until it returns true (updated successfully).
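A sketch of that loop, mirroring the reinterpret_cast approach from the snippet above (the Uint64 alias and the max-update operation are illustrative assumptions, not this PR's code):

#include <algorithm>
#include <atomic>
#include <cstdint>

using Uint64 = uint64_t;  // assumed to match the codebase's alias

// Retry pattern: keep CASing until the exchange succeeds. On failure
// (an interleaved CAS or a spurious weak failure), old_val is refreshed
// with the current value, so the desired result is recomputed each pass.
void AtomicMax(Uint64& target, Uint64 val) {
  std::atomic<Uint64>& tar = reinterpret_cast<std::atomic<Uint64>&>(target);
  Uint64 old_val = tar.load();
  while (!tar.compare_exchange_weak(old_val, std::max(old_val, val))) {
  }
}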
Thanks, I'm still a little fuzzy on atomics. I've been impressed by how fast they are in CUDA even when hit pretty often. I'll change that for the CPU side.
Yeah, it seems the implementation of atomics in CUDA is different from the CPU's. Actually, the slow part of atomics is the cache updates across cores; I'd guess this is a bit different on a GPU.
Thank you, I've just read about this now. It sounds like weak means I can't rely on return == compare to guarantee a swap has taken place. I could probably tweak it to make use of the boolean return instead, but I think there's no big problem going with strong.
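For reference, a hedged sketch of what acting on the boolean return looks like with the strong variant (the function name, alias, and kOpen sentinel value are illustrative, not this PR's code):

#include <atomic>
#include <cstdint>

using Uint64 = uint64_t;              // assumed codebase alias
constexpr Uint64 kOpen = ~Uint64(0);  // illustrative "empty slot" sentinel

// Returns true iff the slot held kOpen and key was swapped in.
bool TryClaim(std::atomic<Uint64>& slot, Uint64 key) {
  Uint64 expected = kOpen;
  // The strong variant never fails spuriously, so the boolean result alone
  // says whether the swap took place; on failure, expected is overwritten
  // with the key that actually occupies the slot.
  return slot.compare_exchange_strong(expected, key);
}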
}

__host__ __device__ Uint64 SpreadBits3(Uint64 v) {
  v = v & 0x1fffff;
These magic numbers look really magical; it would be nice if you could explain how/where you got them.
I have similar ones in the collider; Morton codes are pretty standard and various versions of the magic numbers can be found in algorithm reference books.
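For reference, a sketch of the full function using the standard published constants (the PR's remaining lines are not shown above, so treat this as the textbook version rather than a quote of the code). Each step doubles the gap between groups of bits until every input bit sits three positions apart:

// Spreads the low 21 bits of v so each lands at every third bit position -
// the standard building block for 63-bit 3D Morton codes.
__host__ __device__ Uint64 SpreadBits3(Uint64 v) {
  v = v & 0x1fffff;                       // keep 21 input bits
  v = (v | v << 32) & 0x1f00000000ffff;   // split into 5 + 16 bit groups
  v = (v | v << 16) & 0x1f0000ff0000ff;   // 8-bit groups
  v = (v | v << 8) & 0x100f00f00f00f00f;  // 4-bit groups
  v = (v | v << 4) & 0x10c30c30c30c30c3;  // 2-bit groups
  v = (v | v << 2) & 0x1249249249249249;  // single bits, every third slot
  return v;
}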
uint32_t idx = vert.key & (Size() - 1);
while (1) {
  Uint64& key = table_[idx].key;
  const Uint64 found = AtomicCAS(key, kOpen, vert.key);
Regarding the implementation of a concurrent hashtable, I'm a bit worried about the performance of having atomic operations in a loop. Atomic operations are a lot slower than typical instructions, and the cost is still a lot higher even when only one thread is running.
If writes are not common, I think a lock might give you better performance. If writes are common, I can have a look and see if there is a way to reduce the use of atomic operations.
As usual I was focused more on the GPU, where locks aren't much of an option. I'm totally open to recommendations though.
You can implement a spin lock with atomics.
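For instance, a minimal CPU-side spin lock built on std::atomic_flag - a sketch of the suggestion, not code from this PR:

#include <atomic>

class SpinLock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

 public:
  void lock() {
    // Busy-wait until the current holder clears the flag.
    while (flag.test_and_set(std::memory_order_acquire)) {
    }
  }
  void unlock() { flag.clear(std::memory_order_release); }
};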
My impression was that atomics are mostly slow when concurrent access is actually happening; do they have much performance impact otherwise? I would expect very little concurrency on these - they will only happen during a hash collision (which is already rare) that also happens to be concurrent. I made the step large so that nearby threads don't tend to step on each other.
Yes, they have a performance impact even when there is only one thread running these atomic instructions.
That's good to know, thanks. I'll leave it this way for now, but you're welcome to optimize it - that will be a good learning experience for me.
inline __host__ __device__ void operator()(Uint64 mortonCode) {
  const glm::ivec4 gridIndex = DecodeMorton(mortonCode);

  if (glm::any(glm::greaterThan(glm::ivec3(gridIndex), gridSize))) return;
Also, I just noticed that you have some glm::any(glm::some_comparison(...)) patterns. Maybe we can define some helper functions to make them easier to read? Like anyGreaterThan or something.
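Something like this hypothetical helper (not in the codebase; each comparison/type combination would need its own overload, which is the permutation cost mentioned in the reply below):

#include <glm/glm.hpp>

// Hypothetical readability helper along the lines proposed above.
inline __host__ __device__ bool AnyGreaterThan(glm::ivec3 a, glm::ivec3 b) {
  return glm::any(glm::greaterThan(a, b));
}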
That's potentially a lot of permutations; I think I'm going to leave that for now.
src/sdf/include/sdf.h (outdated)
gridVert.distance = BoundedSDF(gridIndex);

bool keep = false;
float minDist2 = 0.25 * 0.25;
What is the usage of minDist2?
Ah yes, that's dead code now. I had a cool idea, but realized it would create edge cases that could break manifoldness.
src/sdf/include/sdf.h (outdated)
tet[2] = thisVert.Inside();
int edges[6] = {base.edgeVerts[0], -1, -1, -1, -1, -1};
for (const int i : {0, 1, 2}) {
  edges[1] = base.edgeVerts[i + 1];
This part is relatively complicated; maybe we can reorder some statements and avoid reusing the array to make it more readable?
for (const int i : {0, 1, 2}) {
  thisIndex = leadIndex;
  thisIndex[Prev3(i)] -= 1;
  GridVert nextVert = gridVerts[MortonCode(thisIndex)];
  tet[3] = nextVert.Inside();
  int edges1[6] = {base.edgeVerts[0],
                   base.edgeVerts[i + 1],
                   nextVert.edgeVerts[Next3(i) + 4],
                   nextVert.edgeVerts[Prev3(i) + 1],
                   thisVert.edgeVerts[i + 4],
                   base.edgeVerts[Prev3(i) + 4]};
  thisVert = nextVert;
  if (!skipTet && thisVert.key != kOpen) CreateTris(tet, edges1);
  skipTet = thisVert.key == kOpen;
  tet[2] = tet[3];
  int edges2[6] = ...
}
(I personally think it's easy to get lost reading many array-element operations, especially when there are ordering dependencies with other variables.)
Okay, this has been cleaned up a lot; thanks for the suggestion, I ended up using that and a few other things too.
Codecov Report
@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
+ Coverage   98.15%   98.18%   +0.02%
==========================================
  Files          30       33       +3
  Lines        2602     2919     +317
==========================================
+ Hits         2554     2866     +312
- Misses         48       53       +5
I've simplified the API; I think this is pretty much ready to merge.
* polymorphic device functions working
* starting to flesh out algorithm
* getting closer
* fleshed out vert computation
* more progress
* BuildTris
* fixed some syntax
* updating to cpp
* added VecDc
* SDF compiles
* SDF runs
* fixed some bugs
* manifold output
* fixed out of memory bug
* flattened sides
* back to egg-crate
* works with CUDA too
* moved SDF
* Revert "moved SDF" - This reverts commit ba2348f.
* added gyroid sample
* added sdf_test
* added SDF tests
* tests pass
* updated gyroid_module
* fixed gyroid module
* added docs
* addressing feedback
* fix cuda detection
* clean up detect cuda
* fixed CUDA failure
* consistent execution policy in Boolean
* remove SDF wrapper class
* simplified tet logic
* use NeighborInside()
* optimized SDF
* moved RemoveUnreferencedVerts
* put printf statements behind MANIFOLD_DEBUG
* enable hashtable resize
* added test for hashtable resizing
Fixes #129
I've implemented my own take on Marching Cubes for making manifold meshes from signed-distance functions. It's a form of Marching Tetrahedra on a body-centered cubic grid, which ensures all tetrahedra are identical, of maximum quality, and symmetric. It also ensures all output is manifold without needing to deal with edge cases.
I've implemented this for the GPU (of course it works on the CPU as well) because the computation can get pretty intense. The interface is intended to mimic Thrust with its use of functors.
These new tests also unearthed an existing and very tricky bug, whose fix is also included here.
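A rough usage sketch of this functor-based interface follows. All names here (manifold::LevelSet, Box, Mesh, the parameter order) are assumptions based on the description above, not confirmed signatures:

#include "sdf.h"  // this PR's header

// Assumed-name sketch: extract the zero level set of a unit-sphere SDF.
struct Sphere {
  __host__ __device__ float operator()(glm::vec3 p) const {
    return 1.0f - glm::length(p);  // positive inside the unit sphere
  }
};

int main() {
  // Sample the SDF over a bounding box at a chosen grid edge length and
  // build a manifold triangle mesh of the zero level set.
  manifold::Mesh mesh = manifold::LevelSet(
      Sphere(), manifold::Box({-1.1f, -1.1f, -1.1f}, {1.1f, 1.1f, 1.1f}),
      /*edgeLength=*/0.05f);
  return 0;
}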